{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# **Elliptic++ Transactions Dataset**\n",
        "\n",
        "\n",
        "---\n",
        "---\n",
        "\n",
        "\n",
        "Released by: Youssef Elmougy, Ling Liu\n",
        "\n",
        "\n",
        "\n",
        "School of Computer Science, Georgia Institute of Technology\n",
        "\n",
        "Contact: yelmougy3@gatech.edu\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "Github Repository: [https://www.github.com/git-disl/EllipticPlusPlus](https://www.github.com/git-disl/EllipticPlusPlus)\n",
        "\n",
        "\n",
        "If you use our dataset in your work, please cite our paper:\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "\n",
        ">> Youssef Elmougy and Ling Liu. 2023. Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics.\n",
        "\n",
        "---\n",
        "\n"
      ],
      "metadata": {
        "id": "O34u-DVsX4jx"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## [SETUP] Import libraries and csv files "
      ],
      "metadata": {
        "id": "ReHrhaPiaiI-"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Download dataset from: [https://www.github.com/git-disl/EllipticPlusPlus](https://www.github.com/git-disl/EllipticPlusPlus)"
      ],
      "metadata": {
        "id": "TLi0Zc7j6Rb6"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "eUbJT_J-A1Mw",
        "outputId": "6e64f92e-3bac-4dbc-b4f2-d57489140473"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Mounted at /content/drive\n"
          ]
        }
      ],
      "source": [
        "from google.colab import drive\n",
        "drive.mount('/content/drive')\n",
        "!cp drive/My\\ Drive/Elliptic++\\ Dataset/txs_features.csv ./\n",
        "!cp drive/My\\ Drive/Elliptic++\\ Dataset/txs_classes.csv ./\n",
        "!cp drive/My\\ Drive/Elliptic++\\ Dataset/txs_edgelist.csv ./"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kz7WtWhG6MtI",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "ba02acc2-6f29-4d71-c5df-2d6bc5c61453"
      },
      "outputs": [
      ],
      "source": [
        "import numpy as np\n",
        "import pandas as pd\n",
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "import networkx as nx\n",
        "import plotly.graph_objs as go \n",
        "import plotly.offline as py \n",
        "import math\n",
        "\n",
        "!pip install -U ipython \n",
        "from IPython.core.interactiveshell import InteractiveShell\n",
        "InteractiveShell.ast_node_interactivity = 'all'"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.ensemble import RandomForestClassifier\n",
        "from sklearn.metrics import precision_recall_fscore_support\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.linear_model import LogisticRegression\n",
        "from sklearn.neural_network import MLPClassifier\n",
        "from sklearn.metrics import f1_score, accuracy_score, confusion_matrix\n",
        "from sklearn.cluster import KMeans\n",
        "from sklearn.model_selection import GridSearchCV\n",
        "from sklearn.preprocessing import MinMaxScaler\n",
        "from sklearn.ensemble import VotingClassifier\n",
        "from sklearn.base import clone \n",
        "\n",
        "import xgboost as xgb"
      ],
      "metadata": {
        "id": "TKJFAkVLp34j"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install eli5\n",
        "import eli5\n",
        "from eli5.sklearn import PermutationImportance"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "bRW1hh3S4pbS",
        "outputId": "af64ff37-eb12-4dd8-faba-628b9b695aec"
      },
      "execution_count": null,
      "outputs": [
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Transactions Dataset Overview\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "This section loads the 3 csv files (txs_features, txs_classes, txs_edgelist) and provides a quick overview of the dataset structure and features."
      ],
      "metadata": {
        "id": "y3JLmL3SfJqP"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Load saved transactions dataset csv files:"
      ],
      "metadata": {
        "id": "ZcdjXmV8gr8S"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print(\"\\nTransaction features: \\n\")\n",
        "df_txs_features = pd.read_csv(\"txs_features.csv\")\n",
        "df_txs_features\n",
        "\n",
        "print(\"\\nTransaction classes: \\n\")\n",
        "df_txs_classes = pd.read_csv(\"txs_classes.csv\")\n",
        "df_txs_classes\n",
        "\n",
        "print(\"\\nTransaction-Transaction edgelist: \\n\")\n",
        "df_txs_edgelist = pd.read_csv(\"txs_edgelist.csv\")\n",
        "df_txs_edgelist"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "dNNEwGmae2Eo",
        "outputId": "3ed76d40-095b-42dc-8362-5eb54958b88a"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "Transaction features: \n",
            "\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "             txId  Time step  Local_feature_1  Local_feature_2  \\\n",
              "0            3321          1        -0.169615        -0.184668   \n",
              "1           11108          1        -0.137586        -0.184668   \n",
              "2           51816          1        -0.170103        -0.184668   \n",
              "3           68869          1        -0.114267        -0.184668   \n",
              "4           89273          1         5.202107        -0.210553   \n",
              "...           ...        ...              ...              ...   \n",
              "203764  158304003         49        -0.165622        -0.139563   \n",
              "203765  158303998         49        -0.167040        -0.139563   \n",
              "203766  158303966         49        -0.167040        -0.139563   \n",
              "203767  161526077         49        -0.172212        -0.139573   \n",
              "203768  194103537         49        -0.172212        -0.139573   \n",
              "\n",
              "        Local_feature_3  Local_feature_4  Local_feature_5  Local_feature_6  \\\n",
              "0             -1.201369        -0.121970        -0.043875        -0.113002   \n",
              "1             -1.201369        -0.121970        -0.043875        -0.113002   \n",
              "2             -1.201369        -0.121970        -0.043875        -0.113002   \n",
              "3             -1.201369         0.028105        -0.043875        -0.113002   \n",
              "4             -1.756361        -0.121970       260.090707        -0.113002   \n",
              "...                 ...              ...              ...              ...   \n",
              "203764         1.018602        -0.121970        -0.043875        -0.113002   \n",
              "203765         1.018602        -0.121970        -0.043875        -0.113002   \n",
              "203766         1.018602        -0.121970        -0.043875        -0.113002   \n",
              "203767         1.018602        -0.121970        -0.043875        -0.113002   \n",
              "203768         1.018602        -0.121970        -0.043875        -0.113002   \n",
              "\n",
              "        Local_feature_7  Local_feature_8  ...  in_BTC_min  in_BTC_max  \\\n",
              "0             -0.061584        -0.160199  ...    0.534072    0.534072   \n",
              "1             -0.061584        -0.127429  ...    5.611878    5.611878   \n",
              "2             -0.061584        -0.160699  ...    0.456608    0.456608   \n",
              "3              0.547008        -0.161652  ...    0.308900    8.000000   \n",
              "4             -0.061584         5.335864  ...  852.164680  852.164680   \n",
              "...                 ...              ...  ...         ...         ...   \n",
              "203764        -0.061584        -0.156113  ...         NaN         NaN   \n",
              "203765        -0.061584        -0.157564  ...         NaN         NaN   \n",
              "203766        -0.061584        -0.157564  ...         NaN         NaN   \n",
              "203767        -0.061584        -0.162856  ...         NaN         NaN   \n",
              "203768        -0.061584        -0.162856  ...         NaN         NaN   \n",
              "\n",
              "        in_BTC_mean  in_BTC_median  in_BTC_total   out_BTC_min  out_BTC_max  \\\n",
              "0          0.534072       0.534072      0.534072  1.668990e-01     0.367074   \n",
              "1          5.611878       5.611878      5.611878  5.861940e-01     5.025584   \n",
              "2          0.456608       0.456608      0.456608  2.279902e-01     0.228518   \n",
              "3          3.102967       1.000000      9.308900  1.229000e+00     8.079800   \n",
              "4        852.164680     852.164680    852.164680  1.300000e-07    41.264036   \n",
              "...             ...            ...           ...           ...          ...   \n",
              "203764          NaN            NaN           NaN           NaN          NaN   \n",
              "203765          NaN            NaN           NaN           NaN          NaN   \n",
              "203766          NaN            NaN           NaN           NaN          NaN   \n",
              "203767          NaN            NaN           NaN           NaN          NaN   \n",
              "203768          NaN            NaN           NaN           NaN          NaN   \n",
              "\n",
              "        out_BTC_mean  out_BTC_median  out_BTC_total  \n",
              "0           0.266986        0.266986       0.533972  \n",
              "1           2.805889        2.805889       5.611778  \n",
              "2           0.228254        0.228254       0.456508  \n",
              "3           4.654400        4.654400       9.308800  \n",
              "4           0.065016        0.000441     852.164680  \n",
              "...              ...             ...            ...  \n",
              "203764           NaN             NaN            NaN  \n",
              "203765           NaN             NaN            NaN  \n",
              "203766           NaN             NaN            NaN  \n",
              "203767           NaN             NaN            NaN  \n",
              "203768           NaN             NaN            NaN  \n",
              "\n",
              "[203769 rows x 184 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-c00063ae-35bb-4adf-8b2f-f7a6d2444a13\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>txId</th>\n",
              "      <th>Time step</th>\n",
              "      <th>Local_feature_1</th>\n",
              "      <th>Local_feature_2</th>\n",
              "      <th>Local_feature_3</th>\n",
              "      <th>Local_feature_4</th>\n",
              "      <th>Local_feature_5</th>\n",
              "      <th>Local_feature_6</th>\n",
              "      <th>Local_feature_7</th>\n",
              "      <th>Local_feature_8</th>\n",
              "      <th>...</th>\n",
              "      <th>in_BTC_min</th>\n",
              "      <th>in_BTC_max</th>\n",
              "      <th>in_BTC_mean</th>\n",
              "      <th>in_BTC_median</th>\n",
              "      <th>in_BTC_total</th>\n",
              "      <th>out_BTC_min</th>\n",
              "      <th>out_BTC_max</th>\n",
              "      <th>out_BTC_mean</th>\n",
              "      <th>out_BTC_median</th>\n",
              "      <th>out_BTC_total</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>3321</td>\n",
              "      <td>1</td>\n",
              "      <td>-0.169615</td>\n",
              "      <td>-0.184668</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>-0.160199</td>\n",
              "      <td>...</td>\n",
              "      <td>0.534072</td>\n",
              "      <td>0.534072</td>\n",
              "      <td>0.534072</td>\n",
              "      <td>0.534072</td>\n",
              "      <td>0.534072</td>\n",
              "      <td>1.668990e-01</td>\n",
              "      <td>0.367074</td>\n",
              "      <td>0.266986</td>\n",
              "      <td>0.266986</td>\n",
              "      <td>0.533972</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>11108</td>\n",
              "      <td>1</td>\n",
              "      <td>-0.137586</td>\n",
              "      <td>-0.184668</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>-0.127429</td>\n",
              "      <td>...</td>\n",
              "      <td>5.611878</td>\n",
              "      <td>5.611878</td>\n",
              "      <td>5.611878</td>\n",
              "      <td>5.611878</td>\n",
              "      <td>5.611878</td>\n",
              "      <td>5.861940e-01</td>\n",
              "      <td>5.025584</td>\n",
              "      <td>2.805889</td>\n",
              "      <td>2.805889</td>\n",
              "      <td>5.611778</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>51816</td>\n",
              "      <td>1</td>\n",
              "      <td>-0.170103</td>\n",
              "      <td>-0.184668</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>-0.160699</td>\n",
              "      <td>...</td>\n",
              "      <td>0.456608</td>\n",
              "      <td>0.456608</td>\n",
              "      <td>0.456608</td>\n",
              "      <td>0.456608</td>\n",
              "      <td>0.456608</td>\n",
              "      <td>2.279902e-01</td>\n",
              "      <td>0.228518</td>\n",
              "      <td>0.228254</td>\n",
              "      <td>0.228254</td>\n",
              "      <td>0.456508</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>68869</td>\n",
              "      <td>1</td>\n",
              "      <td>-0.114267</td>\n",
              "      <td>-0.184668</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>0.028105</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>0.547008</td>\n",
              "      <td>-0.161652</td>\n",
              "      <td>...</td>\n",
              "      <td>0.308900</td>\n",
              "      <td>8.000000</td>\n",
              "      <td>3.102967</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>9.308900</td>\n",
              "      <td>1.229000e+00</td>\n",
              "      <td>8.079800</td>\n",
              "      <td>4.654400</td>\n",
              "      <td>4.654400</td>\n",
              "      <td>9.308800</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>89273</td>\n",
              "      <td>1</td>\n",
              "      <td>5.202107</td>\n",
              "      <td>-0.210553</td>\n",
              "      <td>-1.756361</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>260.090707</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>5.335864</td>\n",
              "      <td>...</td>\n",
              "      <td>852.164680</td>\n",
              "      <td>852.164680</td>\n",
              "      <td>852.164680</td>\n",
              "      <td>852.164680</td>\n",
              "      <td>852.164680</td>\n",
              "      <td>1.300000e-07</td>\n",
              "      <td>41.264036</td>\n",
              "      <td>0.065016</td>\n",
              "      <td>0.000441</td>\n",
              "      <td>852.164680</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>203764</th>\n",
              "      <td>158304003</td>\n",
              "      <td>49</td>\n",
              "      <td>-0.165622</td>\n",
              "      <td>-0.139563</td>\n",
              "      <td>1.018602</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>-0.156113</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>203765</th>\n",
              "      <td>158303998</td>\n",
              "      <td>49</td>\n",
              "      <td>-0.167040</td>\n",
              "      <td>-0.139563</td>\n",
              "      <td>1.018602</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>-0.157564</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>203766</th>\n",
              "      <td>158303966</td>\n",
              "      <td>49</td>\n",
              "      <td>-0.167040</td>\n",
              "      <td>-0.139563</td>\n",
              "      <td>1.018602</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>-0.157564</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>203767</th>\n",
              "      <td>161526077</td>\n",
              "      <td>49</td>\n",
              "      <td>-0.172212</td>\n",
              "      <td>-0.139573</td>\n",
              "      <td>1.018602</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>-0.162856</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>203768</th>\n",
              "      <td>194103537</td>\n",
              "      <td>49</td>\n",
              "      <td>-0.172212</td>\n",
              "      <td>-0.139573</td>\n",
              "      <td>1.018602</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>-0.162856</td>\n",
              "      <td>...</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "      <td>NaN</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>203769 rows × 184 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-c00063ae-35bb-4adf-8b2f-f7a6d2444a13')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-c00063ae-35bb-4adf-8b2f-f7a6d2444a13 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-c00063ae-35bb-4adf-8b2f-f7a6d2444a13');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 4
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "Transaction classes: \n",
            "\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "             txId  class\n",
              "0            3321      3\n",
              "1           11108      3\n",
              "2           51816      3\n",
              "3           68869      2\n",
              "4           89273      2\n",
              "...           ...    ...\n",
              "203764  158304003      3\n",
              "203765  158303998      3\n",
              "203766  158303966      3\n",
              "203767  161526077      3\n",
              "203768  194103537      3\n",
              "\n",
              "[203769 rows x 2 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-7125e9e5-264a-400b-8350-964e714cea46\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>txId</th>\n",
              "      <th>class</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>3321</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>11108</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>51816</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>68869</td>\n",
              "      <td>2</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>89273</td>\n",
              "      <td>2</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>203764</th>\n",
              "      <td>158304003</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>203765</th>\n",
              "      <td>158303998</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>203766</th>\n",
              "      <td>158303966</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>203767</th>\n",
              "      <td>161526077</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>203768</th>\n",
              "      <td>194103537</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>203769 rows × 2 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-7125e9e5-264a-400b-8350-964e714cea46')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-7125e9e5-264a-400b-8350-964e714cea46 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-7125e9e5-264a-400b-8350-964e714cea46');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 4
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "Transaction-Transaction edgelist: \n",
            "\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "            txId1      txId2\n",
              "0       230425980    5530458\n",
              "1       232022460  232438397\n",
              "2       230460314  230459870\n",
              "3       230333930  230595899\n",
              "4       232013274  232029206\n",
              "...           ...        ...\n",
              "234350  158365409  157930723\n",
              "234351  188708874  188708879\n",
              "234352  157659064  157659046\n",
              "234353   87414554  106877725\n",
              "234354  158589452  158589457\n",
              "\n",
              "[234355 rows x 2 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-d1f1b9a8-013d-41b2-9b4c-e8badcdcad74\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>txId1</th>\n",
              "      <th>txId2</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>230425980</td>\n",
              "      <td>5530458</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>232022460</td>\n",
              "      <td>232438397</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>230460314</td>\n",
              "      <td>230459870</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>230333930</td>\n",
              "      <td>230595899</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>232013274</td>\n",
              "      <td>232029206</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>234350</th>\n",
              "      <td>158365409</td>\n",
              "      <td>157930723</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>234351</th>\n",
              "      <td>188708874</td>\n",
              "      <td>188708879</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>234352</th>\n",
              "      <td>157659064</td>\n",
              "      <td>157659046</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>234353</th>\n",
              "      <td>87414554</td>\n",
              "      <td>106877725</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>234354</th>\n",
              "      <td>158589452</td>\n",
              "      <td>158589457</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>234355 rows × 2 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-d1f1b9a8-013d-41b2-9b4c-e8badcdcad74')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-d1f1b9a8-013d-41b2-9b4c-e8badcdcad74 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-d1f1b9a8-013d-41b2-9b4c-e8badcdcad74');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 4
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Data structure for an example transaction (txId = 272145560):"
      ],
      "metadata": {
        "id": "5Qw43a6xe9rN"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print(\"\\ntxs_features.csv for txId = 272145560\\n\")\n",
        "df_txs_features[df_txs_features['txId']==272145560]\n",
        "\n",
        "print(\"\\ntxs_classes.csv for txId = 272145560\\n\")\n",
        "df_txs_classes[df_txs_classes['txId']==272145560]\n",
        "\n",
        "print(\"\\ntxs_edgelist.csv for txId = 272145560\\n\")\n",
        "df_txs_edgelist[(df_txs_edgelist['txId1']==272145560) | (df_txs_edgelist['txId2']==272145560)]"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 543
        },
        "id": "BHp9b7S1e1-F",
        "outputId": "757d6c2d-3c51-49f6-d0eb-2d701b52fc78"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "txs_features.csv for txId=272145560\n",
            "\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "             txId  Time step  Local_feature_1  Local_feature_2  \\\n",
              "105573  272145560         24        -0.155493        -0.107012   \n",
              "\n",
              "        Local_feature_3  Local_feature_4  Local_feature_5  Local_feature_6  \\\n",
              "105573        -1.201369         -0.12197        -0.043875        -0.113002   \n",
              "\n",
              "        Local_feature_7  Local_feature_8  ...  in_BTC_min  in_BTC_max  \\\n",
              "105573        -0.061584        -0.145749  ...      2.7732      2.7732   \n",
              "\n",
              "        in_BTC_mean  in_BTC_median  in_BTC_total  out_BTC_min  out_BTC_max  \\\n",
              "105573       2.7732         2.7732        2.7732     0.001917     2.770883   \n",
              "\n",
              "        out_BTC_mean  out_BTC_median  out_BTC_total  \n",
              "105573        1.3864          1.3864         2.7728  \n",
              "\n",
              "[1 rows x 184 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-d4c2c7c9-5a47-4c7b-88df-ca4a880bf464\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>txId</th>\n",
              "      <th>Time step</th>\n",
              "      <th>Local_feature_1</th>\n",
              "      <th>Local_feature_2</th>\n",
              "      <th>Local_feature_3</th>\n",
              "      <th>Local_feature_4</th>\n",
              "      <th>Local_feature_5</th>\n",
              "      <th>Local_feature_6</th>\n",
              "      <th>Local_feature_7</th>\n",
              "      <th>Local_feature_8</th>\n",
              "      <th>...</th>\n",
              "      <th>in_BTC_min</th>\n",
              "      <th>in_BTC_max</th>\n",
              "      <th>in_BTC_mean</th>\n",
              "      <th>in_BTC_median</th>\n",
              "      <th>in_BTC_total</th>\n",
              "      <th>out_BTC_min</th>\n",
              "      <th>out_BTC_max</th>\n",
              "      <th>out_BTC_mean</th>\n",
              "      <th>out_BTC_median</th>\n",
              "      <th>out_BTC_total</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>105573</th>\n",
              "      <td>272145560</td>\n",
              "      <td>24</td>\n",
              "      <td>-0.155493</td>\n",
              "      <td>-0.107012</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>-0.12197</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>-0.145749</td>\n",
              "      <td>...</td>\n",
              "      <td>2.7732</td>\n",
              "      <td>2.7732</td>\n",
              "      <td>2.7732</td>\n",
              "      <td>2.7732</td>\n",
              "      <td>2.7732</td>\n",
              "      <td>0.001917</td>\n",
              "      <td>2.770883</td>\n",
              "      <td>1.3864</td>\n",
              "      <td>1.3864</td>\n",
              "      <td>2.7728</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>1 rows × 184 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-d4c2c7c9-5a47-4c7b-88df-ca4a880bf464')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-d4c2c7c9-5a47-4c7b-88df-ca4a880bf464 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-d4c2c7c9-5a47-4c7b-88df-ca4a880bf464');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 5
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "txs_classes.csv for txId=272145560\n",
            "\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "             txId  class\n",
              "105573  272145560      1"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-2cb80618-7e36-43e1-bc3a-7916574c7b0d\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>txId</th>\n",
              "      <th>class</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>105573</th>\n",
              "      <td>272145560</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-2cb80618-7e36-43e1-bc3a-7916574c7b0d')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-2cb80618-7e36-43e1-bc3a-7916574c7b0d button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-2cb80618-7e36-43e1-bc3a-7916574c7b0d');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 5
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "txs_edgelist.csv for txId=272145560\n",
            "\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "            txId1      txId2\n",
              "123072  272145560  296926618\n",
              "123272  272145560  272145556\n",
              "125873  299475624  272145560"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-80dae61b-ee8e-4b72-a243-d2516eb56b84\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>txId1</th>\n",
              "      <th>txId2</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>123072</th>\n",
              "      <td>272145560</td>\n",
              "      <td>296926618</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>123272</th>\n",
              "      <td>272145560</td>\n",
              "      <td>272145556</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>125873</th>\n",
              "      <td>299475624</td>\n",
              "      <td>272145560</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-80dae61b-ee8e-4b72-a243-d2516eb56b84')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-80dae61b-ee8e-4b72-a243-d2516eb56b84 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-80dae61b-ee8e-4b72-a243-d2516eb56b84');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 5
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "\n",
        "The transaction features comprise 94 local features (the time step plus `Local_feature_1` to `Local_feature_93`), 72 aggregate features, and 17 augmented features:\n"
      ],
      "metadata": {
        "id": "moS6bxoLg1Pk"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "list(df_txs_features.columns)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "RpljxgT7k49T",
        "outputId": "916b4dda-11d6-4f92-f10d-3d7040f10ea8"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['txId',\n",
              " 'Time step',\n",
              " 'class',\n",
              " 'Local_feature_1',\n",
              " 'Local_feature_2',\n",
              " 'Local_feature_3',\n",
              " 'Local_feature_4',\n",
              " 'Local_feature_5',\n",
              " 'Local_feature_6',\n",
              " 'Local_feature_7',\n",
              " 'Local_feature_8',\n",
              " 'Local_feature_9',\n",
              " 'Local_feature_10',\n",
              " 'Local_feature_11',\n",
              " 'Local_feature_12',\n",
              " 'Local_feature_13',\n",
              " 'Local_feature_14',\n",
              " 'Local_feature_15',\n",
              " 'Local_feature_16',\n",
              " 'Local_feature_17',\n",
              " 'Local_feature_18',\n",
              " 'Local_feature_19',\n",
              " 'Local_feature_20',\n",
              " 'Local_feature_21',\n",
              " 'Local_feature_22',\n",
              " 'Local_feature_23',\n",
              " 'Local_feature_24',\n",
              " 'Local_feature_25',\n",
              " 'Local_feature_26',\n",
              " 'Local_feature_27',\n",
              " 'Local_feature_28',\n",
              " 'Local_feature_29',\n",
              " 'Local_feature_30',\n",
              " 'Local_feature_31',\n",
              " 'Local_feature_32',\n",
              " 'Local_feature_33',\n",
              " 'Local_feature_34',\n",
              " 'Local_feature_35',\n",
              " 'Local_feature_36',\n",
              " 'Local_feature_37',\n",
              " 'Local_feature_38',\n",
              " 'Local_feature_39',\n",
              " 'Local_feature_40',\n",
              " 'Local_feature_41',\n",
              " 'Local_feature_42',\n",
              " 'Local_feature_43',\n",
              " 'Local_feature_44',\n",
              " 'Local_feature_45',\n",
              " 'Local_feature_46',\n",
              " 'Local_feature_47',\n",
              " 'Local_feature_48',\n",
              " 'Local_feature_49',\n",
              " 'Local_feature_50',\n",
              " 'Local_feature_51',\n",
              " 'Local_feature_52',\n",
              " 'Local_feature_53',\n",
              " 'Local_feature_54',\n",
              " 'Local_feature_55',\n",
              " 'Local_feature_56',\n",
              " 'Local_feature_57',\n",
              " 'Local_feature_58',\n",
              " 'Local_feature_59',\n",
              " 'Local_feature_60',\n",
              " 'Local_feature_61',\n",
              " 'Local_feature_62',\n",
              " 'Local_feature_63',\n",
              " 'Local_feature_64',\n",
              " 'Local_feature_65',\n",
              " 'Local_feature_66',\n",
              " 'Local_feature_67',\n",
              " 'Local_feature_68',\n",
              " 'Local_feature_69',\n",
              " 'Local_feature_70',\n",
              " 'Local_feature_71',\n",
              " 'Local_feature_72',\n",
              " 'Local_feature_73',\n",
              " 'Local_feature_74',\n",
              " 'Local_feature_75',\n",
              " 'Local_feature_76',\n",
              " 'Local_feature_77',\n",
              " 'Local_feature_78',\n",
              " 'Local_feature_79',\n",
              " 'Local_feature_80',\n",
              " 'Local_feature_81',\n",
              " 'Local_feature_82',\n",
              " 'Local_feature_83',\n",
              " 'Local_feature_84',\n",
              " 'Local_feature_85',\n",
              " 'Local_feature_86',\n",
              " 'Local_feature_87',\n",
              " 'Local_feature_88',\n",
              " 'Local_feature_89',\n",
              " 'Local_feature_90',\n",
              " 'Local_feature_91',\n",
              " 'Local_feature_92',\n",
              " 'Local_feature_93',\n",
              " 'Aggregate_feature_1',\n",
              " 'Aggregate_feature_2',\n",
              " 'Aggregate_feature_3',\n",
              " 'Aggregate_feature_4',\n",
              " 'Aggregate_feature_5',\n",
              " 'Aggregate_feature_6',\n",
              " 'Aggregate_feature_7',\n",
              " 'Aggregate_feature_8',\n",
              " 'Aggregate_feature_9',\n",
              " 'Aggregate_feature_10',\n",
              " 'Aggregate_feature_11',\n",
              " 'Aggregate_feature_12',\n",
              " 'Aggregate_feature_13',\n",
              " 'Aggregate_feature_14',\n",
              " 'Aggregate_feature_15',\n",
              " 'Aggregate_feature_16',\n",
              " 'Aggregate_feature_17',\n",
              " 'Aggregate_feature_18',\n",
              " 'Aggregate_feature_19',\n",
              " 'Aggregate_feature_20',\n",
              " 'Aggregate_feature_21',\n",
              " 'Aggregate_feature_22',\n",
              " 'Aggregate_feature_23',\n",
              " 'Aggregate_feature_24',\n",
              " 'Aggregate_feature_25',\n",
              " 'Aggregate_feature_26',\n",
              " 'Aggregate_feature_27',\n",
              " 'Aggregate_feature_28',\n",
              " 'Aggregate_feature_29',\n",
              " 'Aggregate_feature_30',\n",
              " 'Aggregate_feature_31',\n",
              " 'Aggregate_feature_32',\n",
              " 'Aggregate_feature_33',\n",
              " 'Aggregate_feature_34',\n",
              " 'Aggregate_feature_35',\n",
              " 'Aggregate_feature_36',\n",
              " 'Aggregate_feature_37',\n",
              " 'Aggregate_feature_38',\n",
              " 'Aggregate_feature_39',\n",
              " 'Aggregate_feature_40',\n",
              " 'Aggregate_feature_41',\n",
              " 'Aggregate_feature_42',\n",
              " 'Aggregate_feature_43',\n",
              " 'Aggregate_feature_44',\n",
              " 'Aggregate_feature_45',\n",
              " 'Aggregate_feature_46',\n",
              " 'Aggregate_feature_47',\n",
              " 'Aggregate_feature_48',\n",
              " 'Aggregate_feature_49',\n",
              " 'Aggregate_feature_50',\n",
              " 'Aggregate_feature_51',\n",
              " 'Aggregate_feature_52',\n",
              " 'Aggregate_feature_53',\n",
              " 'Aggregate_feature_54',\n",
              " 'Aggregate_feature_55',\n",
              " 'Aggregate_feature_56',\n",
              " 'Aggregate_feature_57',\n",
              " 'Aggregate_feature_58',\n",
              " 'Aggregate_feature_59',\n",
              " 'Aggregate_feature_60',\n",
              " 'Aggregate_feature_61',\n",
              " 'Aggregate_feature_62',\n",
              " 'Aggregate_feature_63',\n",
              " 'Aggregate_feature_64',\n",
              " 'Aggregate_feature_65',\n",
              " 'Aggregate_feature_66',\n",
              " 'Aggregate_feature_67',\n",
              " 'Aggregate_feature_68',\n",
              " 'Aggregate_feature_69',\n",
              " 'Aggregate_feature_70',\n",
              " 'Aggregate_feature_71',\n",
              " 'Aggregate_feature_72',\n",
              " 'in_txs_degree',\n",
              " 'out_txs_degree',\n",
              " 'total_BTC',\n",
              " 'fees',\n",
              " 'size',\n",
              " 'num_input_addresses',\n",
              " 'num_output_addresses',\n",
              " 'in_BTC_min',\n",
              " 'in_BTC_max',\n",
              " 'in_BTC_mean',\n",
              " 'in_BTC_median',\n",
              " 'in_BTC_total',\n",
              " 'out_BTC_min',\n",
              " 'out_BTC_max',\n",
              " 'out_BTC_mean',\n",
              " 'out_BTC_median',\n",
              " 'out_BTC_total']"
            ]
          },
          "metadata": {},
          "execution_count": 21
        }
      ]
    },
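    {
      "cell_type": "markdown",
      "source": [
        "The feature breakdown stated above can be checked against the column names. The following is a minimal sketch, not part of the released notebook: `count_feature_groups` is a hypothetical helper, and it assumes that the 94 local features consist of `Time step` plus the 93 `Local_feature_*` columns, and that every column other than `txId` and `class` without a `Local_feature_` or `Aggregate_feature_` prefix is an augmented feature. The stand-in `example_columns` list mirrors the printed column names; in this notebook, `df_txs_features.columns` could be passed instead.\n"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "from collections import Counter\n",
        "\n",
        "def count_feature_groups(columns):\n",
        "    # Hypothetical helper: bucket column names by the naming convention assumed in the text above.\n",
        "    counts = Counter()\n",
        "    for name in columns:\n",
        "        if name in ('txId', 'class'):\n",
        "            counts['identifier/label'] += 1\n",
        "        elif name == 'Time step' or name.startswith('Local_feature_'):\n",
        "            counts['local'] += 1\n",
        "        elif name.startswith('Aggregate_feature_'):\n",
        "            counts['aggregate'] += 1\n",
        "        else:\n",
        "            counts['augmented'] += 1\n",
        "    return dict(counts)\n",
        "\n",
        "# Stand-in names mirroring the printed column list (no data is loaded here).\n",
        "augmented_columns = [\n",
        "    'in_txs_degree', 'out_txs_degree', 'total_BTC', 'fees', 'size',\n",
        "    'num_input_addresses', 'num_output_addresses',\n",
        "    'in_BTC_min', 'in_BTC_max', 'in_BTC_mean', 'in_BTC_median', 'in_BTC_total',\n",
        "    'out_BTC_min', 'out_BTC_max', 'out_BTC_mean', 'out_BTC_median', 'out_BTC_total',\n",
        "]\n",
        "example_columns = (\n",
        "    ['txId', 'Time step', 'class']\n",
        "    + [f'Local_feature_{i}' for i in range(1, 94)]\n",
        "    + [f'Aggregate_feature_{i}' for i in range(1, 73)]\n",
        "    + augmented_columns\n",
        ")\n",
        "\n",
        "# Print the number of columns that fall into each group.\n",
        "print(count_feature_groups(example_columns))"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },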
    {
      "cell_type": "markdown",
      "source": [
        "## Machine Learning Model Classification\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "This section preprocesses the data, creates the training and testing sets, and runs the Logistic Regression, Random Forest, Multilayer Perceptron, and XGBoost models, as well as the ensembles, on the dataset.\n"
      ],
      "metadata": {
        "id": "_3cwbmq8oi6-"
      }
    },
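    {
      "cell_type": "markdown",
      "source": [
        "The first preprocessing step removes rows with missing values. Before doing so, it can help to measure how many rows are affected. The following is a minimal sketch, not part of the released notebook: `missing_row_fraction` is a hypothetical helper and the `toy` frame holds made-up values; in this notebook the helper could be applied to `df_txs_features` before the `dropna()` call.\n"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "import pandas as pd\n",
        "\n",
        "def missing_row_fraction(df):\n",
        "    # Fraction of rows with at least one NaN, i.e. the rows that dropna() would remove.\n",
        "    return df.isna().any(axis=1).mean()\n",
        "\n",
        "# Toy data, made up for illustration: one of four rows lacks augmented values.\n",
        "toy = pd.DataFrame({\n",
        "    'txId': [1, 2, 3, 4],\n",
        "    'total_BTC': [0.5, np.nan, 1.2, 0.3],\n",
        "    'fees': [0.001, np.nan, 0.002, 0.001],\n",
        "})\n",
        "\n",
        "print(missing_row_fraction(toy))  # share of rows that would be dropped\n",
        "print(len(toy.dropna()))          # number of rows remaining after the drop"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },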
    {
      "cell_type": "markdown",
      "source": [
        "Drop transactions with missing augmented feature values (0.5% of transactions were not de-anonymized):"
      ],
      "metadata": {
        "id": "G2Nf0NwxpLxP"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "df_txs_features = df_txs_features.dropna()\n",
        "df_txs_features"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 505
        },
        "id": "Gf_OiI1XnNGg",
        "outputId": "448b3f69-b0ad-4d74-9887-f1d9ef4aba89"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "             txId  Time step  class  Local_feature_1  Local_feature_2  \\\n",
              "0            3321          1      3        -0.169615        -0.184668   \n",
              "1           11108          1      3        -0.137586        -0.184668   \n",
              "2           51816          1      3        -0.170103        -0.184668   \n",
              "3           68869          1      2        -0.114267        -0.184668   \n",
              "4           89273          1      2         5.202107        -0.210553   \n",
              "...           ...        ...    ...              ...              ...   \n",
              "202799  194747812         49      3         0.558398        -0.198956   \n",
              "202800  194747925         49      3         0.547658        -0.198956   \n",
              "202801  194748063         49      3         0.543600        -0.198853   \n",
              "202802  194748070         49      3         0.537760        -0.198853   \n",
              "202803  194835939         49      3        -0.170463        -0.152788   \n",
              "\n",
              "        Local_feature_3  Local_feature_4  Local_feature_5  Local_feature_6  \\\n",
              "0             -1.201369        -0.121970        -0.043875        -0.113002   \n",
              "1             -1.201369        -0.121970        -0.043875        -0.113002   \n",
              "2             -1.201369        -0.121970        -0.043875        -0.113002   \n",
              "3             -1.201369         0.028105        -0.043875        -0.113002   \n",
              "4             -1.756361        -0.121970       260.090707        -0.113002   \n",
              "...                 ...              ...              ...              ...   \n",
              "202799        -0.091383        -0.121970        -0.043875        -0.113002   \n",
              "202800        -0.091383        -0.121970        -0.043875        -0.113002   \n",
              "202801        -0.091383        -0.121970        -0.043875        -0.113002   \n",
              "202802        -0.091383        -0.121970        -0.043875        -0.113002   \n",
              "202803         1.018602        -0.121970        -0.063725        -0.113002   \n",
              "\n",
              "        Local_feature_7  ...  in_BTC_min  in_BTC_max  in_BTC_mean  \\\n",
              "0             -0.061584  ...    0.000047    0.000047     0.000047   \n",
              "1             -0.061584  ...    0.000493    0.000493     0.000493   \n",
              "2             -0.061584  ...    0.000040    0.000040     0.000040   \n",
              "3              0.547008  ...    0.000027    0.000702     0.000272   \n",
              "4             -0.061584  ...    0.074805    0.074805     0.074805   \n",
              "...                 ...  ...         ...         ...          ...   \n",
              "202799        -0.061584  ...    0.010179    0.010179     0.010179   \n",
              "202800        -0.061584  ...    0.010029    0.010029     0.010029   \n",
              "202801        -0.061584  ...    0.009973    0.009973     0.009973   \n",
              "202802        -0.061584  ...    0.009891    0.009891     0.009891   \n",
              "202803        -0.061584  ...    0.000035    0.000035     0.000035   \n",
              "\n",
              "        in_BTC_median  in_BTC_total   out_BTC_min  out_BTC_max  out_BTC_mean  \\\n",
              "0            0.000047      0.000047  8.301504e-05     0.000032      0.000089   \n",
              "1            0.000493      0.000493  2.915711e-04     0.000444      0.000936   \n",
              "2            0.000040      0.000040  1.134016e-04     0.000020      0.000076   \n",
              "3            0.000088      0.000817  6.113009e-04     0.000714      0.001552   \n",
              "4            0.074805      0.074805  6.466160e-11     0.003648      0.000022   \n",
              "...               ...           ...           ...          ...           ...   \n",
              "202799       0.010179      0.010179  8.223464e-04     0.010104      0.019336   \n",
              "202800       0.010029      0.010029  1.012352e-05     0.010098      0.019052   \n",
              "202801       0.009973      0.009973  4.604647e-04     0.009961      0.018945   \n",
              "202802       0.009891      0.009891  1.505606e-04     0.009935      0.018790   \n",
              "202803       0.000035      0.000035  1.987330e-04     0.000035      0.000133   \n",
              "\n",
              "        out_BTC_median  out_BTC_total  \n",
              "0         8.904096e-05       0.000047  \n",
              "1         9.357923e-04       0.000493  \n",
              "2         7.612341e-05       0.000040  \n",
              "3         1.552291e-03       0.000817  \n",
              "4         1.451405e-07       0.074805  \n",
              "...                ...            ...  \n",
              "202799    1.933576e-02       0.010179  \n",
              "202800    1.905181e-02       0.010029  \n",
              "202801    1.894453e-02       0.009973  \n",
              "202802    1.879015e-02       0.009891  \n",
              "202803    1.332511e-04       0.000035  \n",
              "\n",
              "[202804 rows x 185 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-bd5e2093-379c-422f-b749-0aa0254e52c6\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>txId</th>\n",
              "      <th>Time step</th>\n",
              "      <th>class</th>\n",
              "      <th>Local_feature_1</th>\n",
              "      <th>Local_feature_2</th>\n",
              "      <th>Local_feature_3</th>\n",
              "      <th>Local_feature_4</th>\n",
              "      <th>Local_feature_5</th>\n",
              "      <th>Local_feature_6</th>\n",
              "      <th>Local_feature_7</th>\n",
              "      <th>...</th>\n",
              "      <th>in_BTC_min</th>\n",
              "      <th>in_BTC_max</th>\n",
              "      <th>in_BTC_mean</th>\n",
              "      <th>in_BTC_median</th>\n",
              "      <th>in_BTC_total</th>\n",
              "      <th>out_BTC_min</th>\n",
              "      <th>out_BTC_max</th>\n",
              "      <th>out_BTC_mean</th>\n",
              "      <th>out_BTC_median</th>\n",
              "      <th>out_BTC_total</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>3321</td>\n",
              "      <td>1</td>\n",
              "      <td>3</td>\n",
              "      <td>-0.169615</td>\n",
              "      <td>-0.184668</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>0.000047</td>\n",
              "      <td>0.000047</td>\n",
              "      <td>0.000047</td>\n",
              "      <td>0.000047</td>\n",
              "      <td>0.000047</td>\n",
              "      <td>8.301504e-05</td>\n",
              "      <td>0.000032</td>\n",
              "      <td>0.000089</td>\n",
              "      <td>8.904096e-05</td>\n",
              "      <td>0.000047</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>11108</td>\n",
              "      <td>1</td>\n",
              "      <td>3</td>\n",
              "      <td>-0.137586</td>\n",
              "      <td>-0.184668</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>0.000493</td>\n",
              "      <td>0.000493</td>\n",
              "      <td>0.000493</td>\n",
              "      <td>0.000493</td>\n",
              "      <td>0.000493</td>\n",
              "      <td>2.915711e-04</td>\n",
              "      <td>0.000444</td>\n",
              "      <td>0.000936</td>\n",
              "      <td>9.357923e-04</td>\n",
              "      <td>0.000493</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>51816</td>\n",
              "      <td>1</td>\n",
              "      <td>3</td>\n",
              "      <td>-0.170103</td>\n",
              "      <td>-0.184668</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>0.000040</td>\n",
              "      <td>0.000040</td>\n",
              "      <td>0.000040</td>\n",
              "      <td>0.000040</td>\n",
              "      <td>0.000040</td>\n",
              "      <td>1.134016e-04</td>\n",
              "      <td>0.000020</td>\n",
              "      <td>0.000076</td>\n",
              "      <td>7.612341e-05</td>\n",
              "      <td>0.000040</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>68869</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>-0.114267</td>\n",
              "      <td>-0.184668</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>0.028105</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>0.547008</td>\n",
              "      <td>...</td>\n",
              "      <td>0.000027</td>\n",
              "      <td>0.000702</td>\n",
              "      <td>0.000272</td>\n",
              "      <td>0.000088</td>\n",
              "      <td>0.000817</td>\n",
              "      <td>6.113009e-04</td>\n",
              "      <td>0.000714</td>\n",
              "      <td>0.001552</td>\n",
              "      <td>1.552291e-03</td>\n",
              "      <td>0.000817</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>89273</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>5.202107</td>\n",
              "      <td>-0.210553</td>\n",
              "      <td>-1.756361</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>260.090707</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>0.074805</td>\n",
              "      <td>0.074805</td>\n",
              "      <td>0.074805</td>\n",
              "      <td>0.074805</td>\n",
              "      <td>0.074805</td>\n",
              "      <td>6.466160e-11</td>\n",
              "      <td>0.003648</td>\n",
              "      <td>0.000022</td>\n",
              "      <td>1.451405e-07</td>\n",
              "      <td>0.074805</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>202799</th>\n",
              "      <td>194747812</td>\n",
              "      <td>49</td>\n",
              "      <td>3</td>\n",
              "      <td>0.558398</td>\n",
              "      <td>-0.198956</td>\n",
              "      <td>-0.091383</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>0.010179</td>\n",
              "      <td>0.010179</td>\n",
              "      <td>0.010179</td>\n",
              "      <td>0.010179</td>\n",
              "      <td>0.010179</td>\n",
              "      <td>8.223464e-04</td>\n",
              "      <td>0.010104</td>\n",
              "      <td>0.019336</td>\n",
              "      <td>1.933576e-02</td>\n",
              "      <td>0.010179</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>202800</th>\n",
              "      <td>194747925</td>\n",
              "      <td>49</td>\n",
              "      <td>3</td>\n",
              "      <td>0.547658</td>\n",
              "      <td>-0.198956</td>\n",
              "      <td>-0.091383</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>0.010029</td>\n",
              "      <td>0.010029</td>\n",
              "      <td>0.010029</td>\n",
              "      <td>0.010029</td>\n",
              "      <td>0.010029</td>\n",
              "      <td>1.012352e-05</td>\n",
              "      <td>0.010098</td>\n",
              "      <td>0.019052</td>\n",
              "      <td>1.905181e-02</td>\n",
              "      <td>0.010029</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>202801</th>\n",
              "      <td>194748063</td>\n",
              "      <td>49</td>\n",
              "      <td>3</td>\n",
              "      <td>0.543600</td>\n",
              "      <td>-0.198853</td>\n",
              "      <td>-0.091383</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>0.009973</td>\n",
              "      <td>0.009973</td>\n",
              "      <td>0.009973</td>\n",
              "      <td>0.009973</td>\n",
              "      <td>0.009973</td>\n",
              "      <td>4.604647e-04</td>\n",
              "      <td>0.009961</td>\n",
              "      <td>0.018945</td>\n",
              "      <td>1.894453e-02</td>\n",
              "      <td>0.009973</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>202802</th>\n",
              "      <td>194748070</td>\n",
              "      <td>49</td>\n",
              "      <td>3</td>\n",
              "      <td>0.537760</td>\n",
              "      <td>-0.198853</td>\n",
              "      <td>-0.091383</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>0.009891</td>\n",
              "      <td>0.009891</td>\n",
              "      <td>0.009891</td>\n",
              "      <td>0.009891</td>\n",
              "      <td>0.009891</td>\n",
              "      <td>1.505606e-04</td>\n",
              "      <td>0.009935</td>\n",
              "      <td>0.018790</td>\n",
              "      <td>1.879015e-02</td>\n",
              "      <td>0.009891</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>202803</th>\n",
              "      <td>194835939</td>\n",
              "      <td>49</td>\n",
              "      <td>3</td>\n",
              "      <td>-0.170463</td>\n",
              "      <td>-0.152788</td>\n",
              "      <td>1.018602</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.063725</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>0.000035</td>\n",
              "      <td>0.000035</td>\n",
              "      <td>0.000035</td>\n",
              "      <td>0.000035</td>\n",
              "      <td>0.000035</td>\n",
              "      <td>1.987330e-04</td>\n",
              "      <td>0.000035</td>\n",
              "      <td>0.000133</td>\n",
              "      <td>1.332511e-04</td>\n",
              "      <td>0.000035</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>202804 rows × 185 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-bd5e2093-379c-422f-b749-0aa0254e52c6')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-bd5e2093-379c-422f-b749-0aa0254e52c6 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-bd5e2093-379c-422f-b749-0aa0254e52c6');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 47
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Rescale the augmented features (columns 168 onward) to the [0, 1] range using MinMaxScaler:"
      ],
      "metadata": {
        "id": "r2jcou5LpnWY"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Rescale each augmented feature column (index 168 onward) to [0, 1]\n",
        "for column in df_txs_features.columns[168:]:\n",
        "    # MinMaxScaler expects a 2-D array of shape (n_samples, 1)\n",
        "    feature = df_txs_features[column].to_numpy().reshape(-1, 1)\n",
        "    feature_scaled = MinMaxScaler().fit_transform(feature)\n",
        "    # Flatten back to 1-D before writing the column in place\n",
        "    df_txs_features[column] = feature_scaled.ravel()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "CI_cLQf1nM9d",
        "outputId": "bcdf0c5a-73db-428d-92b3-689247516202"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "MinMaxScaler()"
            ]
          },
          "metadata": {},
          "execution_count": 48
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Remove 'unknown' transactions (class 3), keeping only the labeled ones\n",
        "data = df_txs_features.loc[(df_txs_features['class'] != 3), 'txId']\n",
        "df_txs_features_selected = df_txs_features.loc[df_txs_features['txId'].isin(data)]\n",
        "df_txs_features_selected"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 678
        },
        "id": "v2fi5xwanMsH",
        "outputId": "d5aabff3-cfb0-4194-9062-e53dbf7fa487"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "             txId  Time step  class  Local_feature_1  Local_feature_2  \\\n",
              "3           68869          1      2        -0.114267        -0.184668   \n",
              "4           89273          1      2         5.202107        -0.210553   \n",
              "11         293323          1      2        -0.172726        -0.184668   \n",
              "22        1494462          1      2        -0.172921        -0.158783   \n",
              "25        1582950          1      2        -0.169967        -0.184668   \n",
              "...           ...        ...    ...              ...              ...   \n",
              "202762  194334585         49      2        -0.039416        -0.118083   \n",
              "202763  194334621         49      2        -0.050308        -0.112834   \n",
              "202764  194335206         49      2        -0.154605        -0.116753   \n",
              "202765  194335216         49      2         0.708000        -0.118083   \n",
              "202766  194345639         49      2         0.703311        -0.120152   \n",
              "\n",
              "        Local_feature_3  Local_feature_4  Local_feature_5  Local_feature_6  \\\n",
              "3             -1.201369         0.028105        -0.043875        -0.113002   \n",
              "4             -1.756361        -0.121970       260.090707        -0.113002   \n",
              "11            -1.201369        -0.121970        -0.043875        -0.113002   \n",
              "22            -1.201369        -0.121970        -0.043875        -0.113002   \n",
              "25            -1.201369        -0.121970        -0.043875        -0.113002   \n",
              "...                 ...              ...              ...              ...   \n",
              "202762         1.018602        -0.121970        -0.043875        -0.113002   \n",
              "202763         1.018602        -0.121970        -0.043875        -0.113002   \n",
              "202764         1.018602        -0.121970        -0.043875        -0.113002   \n",
              "202765         1.018602        -0.121970        -0.043875        -0.113002   \n",
              "202766         1.018602        -0.121970        -0.043875        -0.113002   \n",
              "\n",
              "        Local_feature_7  ...    in_BTC_min    in_BTC_max   in_BTC_mean  \\\n",
              "3              0.547008  ...  2.711586e-05  7.022548e-04  2.723834e-04   \n",
              "4             -0.061584  ...  7.480472e-02  7.480472e-02  7.480472e-02   \n",
              "11            -0.061584  ...  3.579195e-06  3.577994e-06  3.578006e-06   \n",
              "22            -0.061584  ...  8.778192e-07  8.766174e-07  8.766298e-07   \n",
              "25            -0.061584  ...  4.198409e-05  4.198289e-05  4.198290e-05   \n",
              "...                 ...  ...           ...           ...           ...   \n",
              "202762        -0.061584  ...  1.858865e-03  1.858864e-03  1.858864e-03   \n",
              "202763        -0.061584  ...  1.707288e-03  1.707287e-03  1.707287e-03   \n",
              "202764        -0.061584  ...  2.557984e-04  2.557972e-04  2.557972e-04   \n",
              "202765        -0.061584  ...  1.226060e-02  1.226060e-02  1.226060e-02   \n",
              "202766        -0.061584  ...  1.219534e-02  1.219534e-02  1.219534e-02   \n",
              "\n",
              "        in_BTC_median  in_BTC_total   out_BTC_min   out_BTC_max  out_BTC_mean  \\\n",
              "3        8.778200e-05  8.171503e-04  6.113009e-04  7.142783e-04      0.001552   \n",
              "4        7.480472e-02  7.480472e-02  6.466160e-11  3.647866e-03      0.000022   \n",
              "11       3.579195e-06  3.575573e-06  4.715323e-07  3.511341e-06      0.000007   \n",
              "22       8.778192e-07  8.741973e-07  1.442451e-06  6.094506e-07      0.000002   \n",
              "25       4.198409e-05  4.198047e-05  2.302948e-05  3.817869e-05      0.000080   \n",
              "...               ...           ...           ...           ...           ...   \n",
              "202762   1.858865e-03  1.858862e-03  1.992449e-05  1.868443e-03      0.003531   \n",
              "202763   1.707288e-03  1.707285e-03  4.973969e-06  1.718449e-03      0.003243   \n",
              "202764   2.557984e-04  2.557948e-04  1.933202e-04  2.232165e-04      0.000486   \n",
              "202765   1.226060e-02  1.226060e-02  5.065905e-05  1.233830e-02      0.023291   \n",
              "202766   1.219534e-02  1.219534e-02  6.963557e-05  1.226921e-02      0.023167   \n",
              "\n",
              "        out_BTC_median  out_BTC_total  \n",
              "3         1.552291e-03   8.171446e-04  \n",
              "4         1.451405e-07   7.480473e-02  \n",
              "11        6.780735e-06   3.569892e-06  \n",
              "22        1.632382e-06   8.597370e-07  \n",
              "25        7.973672e-05   4.197479e-05  \n",
              "...                ...            ...  \n",
              "202762    3.531138e-03   1.858833e-03  \n",
              "202763    3.243191e-03   1.707255e-03  \n",
              "202764    4.858661e-04   2.557661e-04  \n",
              "202765    2.329083e-02   1.226057e-02  \n",
              "202766    2.316686e-02   1.219531e-02  \n",
              "\n",
              "[46045 rows x 185 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-3ef8f0d1-7e47-4d48-b357-f639adbc9a1d\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>txId</th>\n",
              "      <th>Time step</th>\n",
              "      <th>class</th>\n",
              "      <th>Local_feature_1</th>\n",
              "      <th>Local_feature_2</th>\n",
              "      <th>Local_feature_3</th>\n",
              "      <th>Local_feature_4</th>\n",
              "      <th>Local_feature_5</th>\n",
              "      <th>Local_feature_6</th>\n",
              "      <th>Local_feature_7</th>\n",
              "      <th>...</th>\n",
              "      <th>in_BTC_min</th>\n",
              "      <th>in_BTC_max</th>\n",
              "      <th>in_BTC_mean</th>\n",
              "      <th>in_BTC_median</th>\n",
              "      <th>in_BTC_total</th>\n",
              "      <th>out_BTC_min</th>\n",
              "      <th>out_BTC_max</th>\n",
              "      <th>out_BTC_mean</th>\n",
              "      <th>out_BTC_median</th>\n",
              "      <th>out_BTC_total</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>68869</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>-0.114267</td>\n",
              "      <td>-0.184668</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>0.028105</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>0.547008</td>\n",
              "      <td>...</td>\n",
              "      <td>2.711586e-05</td>\n",
              "      <td>7.022548e-04</td>\n",
              "      <td>2.723834e-04</td>\n",
              "      <td>8.778200e-05</td>\n",
              "      <td>8.171503e-04</td>\n",
              "      <td>6.113009e-04</td>\n",
              "      <td>7.142783e-04</td>\n",
              "      <td>0.001552</td>\n",
              "      <td>1.552291e-03</td>\n",
              "      <td>8.171446e-04</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>89273</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>5.202107</td>\n",
              "      <td>-0.210553</td>\n",
              "      <td>-1.756361</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>260.090707</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>7.480472e-02</td>\n",
              "      <td>7.480472e-02</td>\n",
              "      <td>7.480472e-02</td>\n",
              "      <td>7.480472e-02</td>\n",
              "      <td>7.480472e-02</td>\n",
              "      <td>6.466160e-11</td>\n",
              "      <td>3.647866e-03</td>\n",
              "      <td>0.000022</td>\n",
              "      <td>1.451405e-07</td>\n",
              "      <td>7.480473e-02</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>293323</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>-0.172726</td>\n",
              "      <td>-0.184668</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>3.579195e-06</td>\n",
              "      <td>3.577994e-06</td>\n",
              "      <td>3.578006e-06</td>\n",
              "      <td>3.579195e-06</td>\n",
              "      <td>3.575573e-06</td>\n",
              "      <td>4.715323e-07</td>\n",
              "      <td>3.511341e-06</td>\n",
              "      <td>0.000007</td>\n",
              "      <td>6.780735e-06</td>\n",
              "      <td>3.569892e-06</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>22</th>\n",
              "      <td>1494462</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>-0.172921</td>\n",
              "      <td>-0.158783</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>8.778192e-07</td>\n",
              "      <td>8.766174e-07</td>\n",
              "      <td>8.766298e-07</td>\n",
              "      <td>8.778192e-07</td>\n",
              "      <td>8.741973e-07</td>\n",
              "      <td>1.442451e-06</td>\n",
              "      <td>6.094506e-07</td>\n",
              "      <td>0.000002</td>\n",
              "      <td>1.632382e-06</td>\n",
              "      <td>8.597370e-07</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25</th>\n",
              "      <td>1582950</td>\n",
              "      <td>1</td>\n",
              "      <td>2</td>\n",
              "      <td>-0.169967</td>\n",
              "      <td>-0.184668</td>\n",
              "      <td>-1.201369</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>4.198409e-05</td>\n",
              "      <td>4.198289e-05</td>\n",
              "      <td>4.198290e-05</td>\n",
              "      <td>4.198409e-05</td>\n",
              "      <td>4.198047e-05</td>\n",
              "      <td>2.302948e-05</td>\n",
              "      <td>3.817869e-05</td>\n",
              "      <td>0.000080</td>\n",
              "      <td>7.973672e-05</td>\n",
              "      <td>4.197479e-05</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>202762</th>\n",
              "      <td>194334585</td>\n",
              "      <td>49</td>\n",
              "      <td>2</td>\n",
              "      <td>-0.039416</td>\n",
              "      <td>-0.118083</td>\n",
              "      <td>1.018602</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>1.858865e-03</td>\n",
              "      <td>1.858864e-03</td>\n",
              "      <td>1.858864e-03</td>\n",
              "      <td>1.858865e-03</td>\n",
              "      <td>1.858862e-03</td>\n",
              "      <td>1.992449e-05</td>\n",
              "      <td>1.868443e-03</td>\n",
              "      <td>0.003531</td>\n",
              "      <td>3.531138e-03</td>\n",
              "      <td>1.858833e-03</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>202763</th>\n",
              "      <td>194334621</td>\n",
              "      <td>49</td>\n",
              "      <td>2</td>\n",
              "      <td>-0.050308</td>\n",
              "      <td>-0.112834</td>\n",
              "      <td>1.018602</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>1.707288e-03</td>\n",
              "      <td>1.707287e-03</td>\n",
              "      <td>1.707287e-03</td>\n",
              "      <td>1.707288e-03</td>\n",
              "      <td>1.707285e-03</td>\n",
              "      <td>4.973969e-06</td>\n",
              "      <td>1.718449e-03</td>\n",
              "      <td>0.003243</td>\n",
              "      <td>3.243191e-03</td>\n",
              "      <td>1.707255e-03</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>202764</th>\n",
              "      <td>194335206</td>\n",
              "      <td>49</td>\n",
              "      <td>2</td>\n",
              "      <td>-0.154605</td>\n",
              "      <td>-0.116753</td>\n",
              "      <td>1.018602</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>2.557984e-04</td>\n",
              "      <td>2.557972e-04</td>\n",
              "      <td>2.557972e-04</td>\n",
              "      <td>2.557984e-04</td>\n",
              "      <td>2.557948e-04</td>\n",
              "      <td>1.933202e-04</td>\n",
              "      <td>2.232165e-04</td>\n",
              "      <td>0.000486</td>\n",
              "      <td>4.858661e-04</td>\n",
              "      <td>2.557661e-04</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>202765</th>\n",
              "      <td>194335216</td>\n",
              "      <td>49</td>\n",
              "      <td>2</td>\n",
              "      <td>0.708000</td>\n",
              "      <td>-0.118083</td>\n",
              "      <td>1.018602</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>1.226060e-02</td>\n",
              "      <td>1.226060e-02</td>\n",
              "      <td>1.226060e-02</td>\n",
              "      <td>1.226060e-02</td>\n",
              "      <td>1.226060e-02</td>\n",
              "      <td>5.065905e-05</td>\n",
              "      <td>1.233830e-02</td>\n",
              "      <td>0.023291</td>\n",
              "      <td>2.329083e-02</td>\n",
              "      <td>1.226057e-02</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>202766</th>\n",
              "      <td>194345639</td>\n",
              "      <td>49</td>\n",
              "      <td>2</td>\n",
              "      <td>0.703311</td>\n",
              "      <td>-0.120152</td>\n",
              "      <td>1.018602</td>\n",
              "      <td>-0.121970</td>\n",
              "      <td>-0.043875</td>\n",
              "      <td>-0.113002</td>\n",
              "      <td>-0.061584</td>\n",
              "      <td>...</td>\n",
              "      <td>1.219534e-02</td>\n",
              "      <td>1.219534e-02</td>\n",
              "      <td>1.219534e-02</td>\n",
              "      <td>1.219534e-02</td>\n",
              "      <td>1.219534e-02</td>\n",
              "      <td>6.963557e-05</td>\n",
              "      <td>1.226921e-02</td>\n",
              "      <td>0.023167</td>\n",
              "      <td>2.316686e-02</td>\n",
              "      <td>1.219531e-02</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>46045 rows × 185 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-3ef8f0d1-7e47-4d48-b357-f639adbc9a1d')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-3ef8f0d1-7e47-4d48-b357-f639adbc9a1d button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-3ef8f0d1-7e47-4d48-b357-f639adbc9a1d');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 49
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Split the labelled data into training and testing sets by time step; transactions with class 3 are excluded from both sets.\n",
        "\n",
        "**Training set**: Time steps 1 to 34\n",
        "\n",
        "**Testing set**: Time steps 35 to 49"
      ],
      "metadata": {
        "id": "1yHvfgghreFy"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Goal: binary classification with labels 0 and 1\n",
        "# 0: licit, 1: illicit\n",
        "\n",
        "X_data = df_txs_features_selected.loc[(df_txs_features_selected['Time step'] < 35) & (df_txs_features_selected['class'] != 3), 'txId']\n",
        "X_training_timesteps = df_txs_features_selected.loc[df_txs_features_selected['txId'].isin(X_data)]\n",
        "X_train = X_training_timesteps.drop(columns=['txId', 'class', 'Time step'])\n",
        "\n",
        "X_data_test = df_txs_features_selected.loc[(df_txs_features_selected['Time step'] >= 35) & (df_txs_features_selected['class'] != 3), 'txId']\n",
        "X_testing_timesteps = df_txs_features_selected.loc[df_txs_features_selected['txId'].isin(X_data_test)]\n",
        "X_test = X_testing_timesteps.drop(columns=['txId', 'class', 'Time step'])\n",
        "\n",
        "y_training_timesteps = X_training_timesteps[['class']]\n",
        "y_training_timesteps = y_training_timesteps['class'].apply(lambda x: 0 if x == 2 else 1 ) # map licit (class 2) to 0 and illicit (class 1) to 1\n",
        "y_train = y_training_timesteps\n",
        "\n",
        "y_testing_timesteps = X_testing_timesteps[['class']]\n",
        "y_testing_timesteps = y_testing_timesteps['class'].apply(lambda x: 0 if x == 2 else 1 ) # map licit (class 2) to 0 and illicit (class 1) to 1\n",
        "y_test = y_testing_timesteps"
      ],
      "metadata": {
        "id": "Nps2xBp_qakQ"
      },
      "execution_count": null,
      "outputs": []
    },
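    {
      "cell_type": "markdown",
      "source": [
        "The split above can be sketched on a toy table. This is an illustrative sketch only: the frame `toy`, its column `feat_a`, and all values are made up, and it assumes each `txId` appears once, so that a boolean mask selects the same rows as the `isin` lookup used above."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# Illustrative sketch: toy stand-in for df_txs_features_selected (all values are made up)\n",
        "import pandas as pd\n",
        "\n",
        "toy = pd.DataFrame({\n",
        "    'txId': [10, 11, 12, 13, 14, 15],\n",
        "    'Time step': [1, 34, 34, 35, 49, 40],\n",
        "    'class': [1, 2, 3, 2, 1, 3],  # 1: illicit, 2: licit, 3: excluded from the split\n",
        "    'feat_a': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],\n",
        "})\n",
        "\n",
        "labelled = toy['class'] != 3  # keep only class 1 and class 2 rows\n",
        "toy_train = toy.loc[labelled & (toy['Time step'] < 35)]\n",
        "toy_test = toy.loc[labelled & (toy['Time step'] >= 35)]\n",
        "\n",
        "# Map class 2 (licit) to 0 and class 1 (illicit) to 1\n",
        "y_toy_train = (toy_train['class'] == 1).astype(int)\n",
        "y_toy_test = (toy_test['class'] == 1).astype(int)\n",
        "\n",
        "print(toy_train['txId'].tolist(), y_toy_train.tolist())\n",
        "print(toy_test['txId'].tolist(), y_toy_test.tolist())"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },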
    {
      "cell_type": "markdown",
      "source": [
        "Run the individual classifiers: Logistic Regression (LR), Random Forest (RF), Multilayer Perceptron (MLP), and XGBoost (XGB)."
      ],
      "metadata": {
        "id": "px1WEceJs0jm"
      }
    },
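    {
      "cell_type": "markdown",
      "source": [
        "The classifier cells below repeat the same metric code. A minimal sketch of a helper that could factor it out is shown here; the name `report_metrics` and the toy labels are hypothetical and not part of the original notebook. Note that for single-label classification the micro-averaged F1 score equals accuracy, so on an imbalanced dataset it is dominated by the majority class."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import f1_score, precision_recall_fscore_support\n",
        "\n",
        "def report_metrics(name, y_true, y_pred):\n",
        "    # Hypothetical helper: metrics for the positive class (illicit = 1) plus micro-averaged F1\n",
        "    prec, rec, f1, _ = precision_recall_fscore_support(\n",
        "        y_true, y_pred, average='binary', pos_label=1, zero_division=0)\n",
        "    micro_f1 = f1_score(y_true, y_pred, average='micro')\n",
        "    print(name)\n",
        "    print('Precision: %.3f' % prec)\n",
        "    print('Recall: %.3f' % rec)\n",
        "    print('F1 Score: %.3f' % f1)\n",
        "    print('Micro-Average F1 Score: %.3f' % micro_f1)\n",
        "    return prec, rec, f1, micro_f1\n",
        "\n",
        "# Toy labels, made up for illustration\n",
        "_ = report_metrics('Toy example', [0, 0, 1, 1, 1], [0, 1, 1, 1, 0])"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },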
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "u3Nc_MN0HOB2",
        "outputId": "be0dd97e-f9ad-4d4a-e1d3-283a8b37509e"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Logistic Regression\n",
            "Precision: 0.328 \n",
            "Recall: 0.707 \n",
            "F1 Score: 0.448\n",
            "Micro-Average F1 Score: 0.884\n"
          ]
        }
      ],
      "source": [
        "# LOGISTIC REGRESSION (LR)\n",
        "cLR = LogisticRegression(max_iter=1000).fit(X_train.values,y_train.values)\n",
        "y_preds_LR = cLR.predict(X_test.values)\n",
        "prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_LR)\n",
        "\n",
        "print(\"Logistic Regression\")\n",
        "print(\"Precision: %.3f \\nRecall: %.3f \\nF1 Score: %.3f\"%(prec[1],rec[1],f1[1]))\n",
        "micro_f1 = f1_score(y_test, y_preds_LR, average='micro')\n",
        "print(\"Micro-Average F1 Score: %.3f\"%(micro_f1))"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# RANDOM FOREST (RF)\n",
        "cRF = RandomForestClassifier(n_estimators=50).fit(X_train.values,y_train.values)\n",
        "y_preds_RF = cRF.predict(X_test.values)\n",
        "prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_RF)\n",
        "\n",
        "print(\"Random Forest\")\n",
        "print(\"Precision: %.3f \\nRecall: %.3f \\nF1 Score: %.3f\"%(prec[1],rec[1],f1[1]))\n",
        "micro_f1 = f1_score(y_test, y_preds_RF, average='micro')\n",
        "print(\"Micro-Average F1 Score: %.3f\"%(micro_f1))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Dd5wTNsFtF8i",
        "outputId": "cc87ed75-a19b-436a-eeac-4d92f62333cd"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Random Forest\n",
            "Precision: 0.975 \n",
            "Recall: 0.719 \n",
            "F1 Score: 0.828\n",
            "Micro-Average F1 Score: 0.980\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "71pdefvzH-pj",
        "outputId": "d3ec0e0a-b7e9-4264-acb8-e905a6b1f596"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Multilayer Perceptron (MLP)\n",
            "Precision: 0.611 \n",
            "Recall: 0.613 \n",
            "F1 Score: 0.612\n",
            "Micro-Average F1 Score: 0.949\n"
          ]
        }
      ],
      "source": [
        "# MULTILAYER PERCEPTRON (MLP)\n",
        "cMLP = MLPClassifier(solver='adam', learning_rate_init=0.001, max_iter=200).fit(X_train.values,y_train.values)\n",
        "y_preds_MLP = cMLP.predict(X_test.values)\n",
        "prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_MLP)\n",
        "\n",
        "print(\"Multilayer Perceptron (MLP)\")\n",
        "print(\"Precision: %.3f \\nRecall: %.3f \\nF1 Score: %.3f\"%(prec[1],rec[1],f1[1]))\n",
        "micro_f1 = f1_score(y_test, y_preds_MLP, average='micro')\n",
        "print(\"Micro-Average F1 Score: %.3f\"%(micro_f1))"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# XGBOOST (XGB)\n",
        "cXGB = xgb.XGBClassifier(objective=\"multi:softmax\", num_class=2, random_state=42)\n",
        "cXGB.fit(X_train.values, y_train.values)\n",
        "y_preds_XGB = cXGB.predict(X_test.values)\n",
        "prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_XGB)\n",
        "\n",
        "print(\"XGBOOST\")\n",
        "print(\"Precision: %.3f \\nRecall: %.3f \\nF1 Score: %.3f\"%(prec[1],rec[1],f1[1]))\n",
        "micro_f1 = f1_score(y_test, y_preds_XGB, average='micro')\n",
        "print(\"Micro-Average F1 Score: %.3f\"%(micro_f1))\n",
        "# print(confusion_matrix(y_test.values, y_preds_XGB))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "q-TF75xyHVMm",
        "outputId": "36dd6043-f64c-4d0a-af88-cd6a52e05a29"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "XGBOOST\n",
            "Precision: 0.793 \n",
            "Recall: 0.718 \n",
            "F1 Score: 0.754\n",
            "Micro-Average F1 Score: 0.969\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Run ensemble classifiers (RF+MLP, RF+XGB, MLP+XGB, RF+MLP+XGB):"
      ],
      "metadata": {
        "id": "tHTu-iSgyf-u"
      }
    },
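    {
      "cell_type": "markdown",
      "source": [
        "With `voting='hard'` and only two estimators, any disagreement is a tie. scikit-learn documents that ties are resolved by ascending class label order, so a two-model ensemble predicts illicit (1) only when both models agree, which may explain the high precision and lower recall of the pairs below. The sketch here mimics that rule with NumPy on made-up votes; it is an illustration, not the library's code path."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "\n",
        "# Rows are samples, columns are the labels predicted by two base models (made-up votes)\n",
        "votes = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
        "\n",
        "# Majority vote per row; np.argmax returns the first maximum, so a 1-1 tie resolves to label 0\n",
        "majority = np.apply_along_axis(\n",
        "    lambda row: np.argmax(np.bincount(row, minlength=2)), axis=1, arr=votes)\n",
        "print(majority)"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },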
    {
      "cell_type": "code",
      "source": [
        "# list of (name, estimator) pairs for the ensemble\n",
        "estimatorsXGBRF=[('RF', cRF), ('XGB', cXGB)]\n",
        "# hard-voting classifier over these models; fit() below refits clones of them\n",
        "ensembleXGBRF = VotingClassifier(estimatorsXGBRF, voting='hard')\n",
        "ensembleXGBRF.fit(X_train.values, y_train.values)\n",
        "y_preds_XGBRF = ensembleXGBRF.predict(X_test.values)\n",
        "prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_XGBRF)\n",
        "\n",
        "print(\"Ensemble: XGBoost (XGB) + Random Forest (RF)\")\n",
        "print(\"Precision: %.3f \\nRecall: %.3f \\nF1 Score: %.3f\"%(prec[1],rec[1],f1[1]))\n",
        "micro_f1 = f1_score(y_test, y_preds_XGBRF, average='micro')\n",
        "print(\"Micro-Average F1 Score: %.3f\"%(micro_f1))"
      ],
      "metadata": {
        "id": "rE-LCNAUv2bG",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "493f325d-737e-4ed0-f8b5-ea566dac26a5"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Ensemble: XGBoost (XGB) + Random Forest (RF)\n",
            "Precision: 0.977 \n",
            "Recall: 0.706 \n",
            "F1 Score: 0.820\n",
            "Micro-Average F1 Score: 0.979\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# list of (name, estimator) pairs for the ensemble\n",
        "estimatorsMLPXGB=[('MLP', cMLP), ('XGB', cXGB)]\n",
        "# hard-voting classifier over these models; fit() below refits clones of them\n",
        "ensembleMLPXGB = VotingClassifier(estimatorsMLPXGB, voting='hard')\n",
        "ensembleMLPXGB.fit(X_train.values, y_train.values)\n",
        "y_preds_MLPXGB = ensembleMLPXGB.predict(X_test.values)\n",
        "prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_MLPXGB)\n",
        "\n",
        "print(\"Ensemble: Multilayer Perceptron (MLP) + XGBoost\")\n",
        "print(\"Precision: %.3f \\nRecall: %.3f \\nF1 Score: %.3f\"%(prec[1],rec[1],f1[1]))\n",
        "micro_f1 = f1_score(y_test, y_preds_MLPXGB, average='micro')\n",
        "print(\"Micro-Average F1 Score: %.3f\"%(micro_f1))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ExY9eA20WXgZ",
        "outputId": "ef4a3c50-8017-4319-f7e3-7dc581fb2bc5"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Ensemble: Multilayer Perceptron (MLP) + XGBoost\n",
            "Precision: 0.974 \n",
            "Recall: 0.596 \n",
            "F1 Score: 0.739\n",
            "Micro-Average F1 Score: 0.972\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# list of (name, estimator) pairs for the ensemble\n",
        "estimatorsRFMLP=[('MLP', cMLP), ('RF', cRF)]\n",
        "# hard-voting classifier over these models; fit() below refits clones of them\n",
        "ensembleRFMLP = VotingClassifier(estimatorsRFMLP, voting='hard')\n",
        "ensembleRFMLP.fit(X_train.values, y_train.values)\n",
        "y_preds_RFMLP = ensembleRFMLP.predict(X_test.values)\n",
        "prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_RFMLP)\n",
        "\n",
        "print(\"Ensemble: Random Forest (RF) + Multilayer Perceptron (MLP)\")\n",
        "print(\"Precision: %.3f \\nRecall: %.3f \\nF1 Score: %.3f\"%(prec[1],rec[1],f1[1]))\n",
        "micro_f1 = f1_score(y_test, y_preds_RFMLP, average='micro')\n",
        "print(\"Micro-Average F1 Score: %.3f\"%(micro_f1))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "t8u6s6vOmZLn",
        "outputId": "248d51ba-89be-4cd3-a0c5-c7d7819e21d4"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Ensemble: Random Forest (RF) + Multilayer Perceptron (MLP)\n",
            "Precision: 0.989 \n",
            "Recall: 0.635 \n",
            "F1 Score: 0.773\n",
            "Micro-Average F1 Score: 0.975\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# list of (name, estimator) pairs for the ensemble\n",
        "estimatorsXGBRFMLP=[('XGB', cXGB), ('MLP', cMLP), ('RF', cRF)]\n",
        "# hard-voting classifier over these models; fit() below refits clones of them\n",
        "ensembleXGBRFMLP = VotingClassifier(estimatorsXGBRFMLP, voting='hard')\n",
        "ensembleXGBRFMLP.fit(X_train.values, y_train.values)\n",
        "y_preds_XGBRFMLP = ensembleXGBRFMLP.predict(X_test.values)\n",
        "prec,rec,f1,num = precision_recall_fscore_support(y_test.values, y_preds_XGBRFMLP)\n",
        "\n",
        "print(\"Ensemble (all): XGBoost + Random Forest (RF) + Multilayer Perceptron (MLP)\")\n",
        "print(\"Precision: %.3f \\nRecall: %.3f \\nF1 Score: %.3f\"%(prec[1],rec[1],f1[1]))\n",
        "micro_f1 = f1_score(y_test, y_preds_XGBRFMLP, average='micro')\n",
        "print(\"Micro-Average F1 Score: %.3f\"%(micro_f1))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "RVPMNWtLXpJ9",
        "outputId": "5c25567d-29aa-46b3-a6e2-987cdfd3658c"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Ensemble (all): XGBoost + Random Forest (RF) + Multilayer Perceptron (MLP)\n",
            "Precision: 0.962 \n",
            "Recall: 0.723 \n",
            "F1 Score: 0.826\n",
            "Micro-Average F1 Score: 0.980\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Note: the LSTM classifier is not included in this notebook."
      ],
      "metadata": {
        "id": "dNR2aTas1Ej3"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# **Acknowledgements**\n",
        "\n",
        "\n",
        "---\n",
        "---\n",
        "\n",
        "\n",
        "Released by: Youssef Elmougy, Ling Liu\n",
        "\n",
        "\n",
        "\n",
        "School of Computer Science, Georgia Institute of Technology\n",
        "\n",
        "Contact: yelmougy3@gatech.edu\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "Github Repository: [https://www.github.com/git-disl/EllipticPlusPlus](https://www.github.com/git-disl/EllipticPlusPlus)\n",
        "\n",
        "\n",
        "If you use our dataset in your work, please cite our paper:\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "\n",
        ">> Youssef Elmougy and Ling Liu. 2023. Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics.\n",
        "\n",
        "---\n",
        "\n"
      ],
      "metadata": {
        "id": "BwrFHYfy5hrz"
      }
    }
  ]
}